Social Media Analytics: The Kosmix Story

Authors

  • Xiaoyong Chai
  • Omkar Deshpande
  • Nikesh Garera
  • Abhishek Gattani
  • Wang Lam
  • Digvijay S. Lamba
  • Lu Liu
  • Mitul Tiwari
  • Michel Tourn
  • Zoheb Vacheri
  • STS Prasad
  • Sri Subramaniam
  • Venky Harinarayan
  • Anand Rajaraman
  • Adel Ardalan
  • Sanjib Das
  • Paul Suganthan G. C.
  • AnHai Doan
Abstract

Kosmix was a Silicon Valley startup founded in 2005 by Anand Rajaraman and Venky Harinarayan. Initially targeting Deep Web search, in early 2010 Kosmix shifted its main focus to social media and built a large infrastructure to perform social media analytics for a variety of real-world applications. In 2011 Kosmix was acquired by Walmart and converted into @WalmartLabs, the advanced research and development arm of Walmart. The goals of the acquisition were to provide a core of technical people in the Valley and attract more, to help improve traditional e-commerce for Walmart, and to explore the future of e-commerce. This future looks increasingly social, mobile, and local. Accordingly, @WalmartLabs continues to develop the social media analytics infrastructure pioneered by Kosmix, and uses it to explore a range of social e-commerce applications. In this paper we describe social media analytics as carried out at Kosmix. While our framework can handle many types of social media data, for concreteness we will focus mostly on tweets. Section 2 describes the analytics architecture, the applications, and the challenges. We describe in particular the Social Genome, a large real-time social knowledge base that lay at the heart of Kosmix and powered most of its applications. Section 3 describes how the Social Genome was built, using Wikipedia, a set of other data sources, and social media data. Section 4 describes how we classify and tag tweets, and how we extract entities from tweets and link them to a knowledge base. Section 5 describes how we detect and monitor events in the Twittersphere. Section 6 discusses how we process the high-speed Twitter stream using Muppet, a scalable distributed stream processing engine built in-house [1]. Section 7 discusses lessons learned and related work, and Section 8 concludes. Parts of the work described here have been open-sourced [1] and described in detail in recent papers [18, 23, 25, 32].

Similar resources

Muppet: MapReduce-Style Processing of Fast Data

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write su...

Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach

Many applications that process social data, such as tweets, must extract entities from tweets (e.g., “Obama” and “Hawaii” in “Obama went to Hawaii”), link them to entities in a knowledge base (e.g., Wikipedia), classify tweets into a set of predefined topics, and assign descriptive tags to tweets. Few solutions exist today to solve these problems for social data, and they are limited in importa...

User-Generated Content in Social Media

This report documents the program and the outcomes of Dagstuhl Seminar 17301 “User-Generated Content in Social Media”. Social media have a profound impact on individuals, businesses, and society. As users post vast amounts of text and multimedia content every minute, the analysis of this user generated content (UGC) can offer insights to individual and societal concerns and could be beneficial ...

Social Media Visual Analytics for Emergency Management: A Systematic Mapping

Social media visual analytics are becoming important in helping emergency managers gain situation awareness and make better decisions. In this paper, we present a systematic mapping to understand how the field is structured, find out what research topics exists in social media visual analytics for emergencies, and understand what the visual analytics application categories in this area are. Thi...

Journal:
  • IEEE Data Eng. Bull.

Volume 36  Issue 

Pages  -

Publication date: 2013